ABSTRACT

In this paper, a visual language, VCP, for queries on complex-value
databases is proposed. The main strength of the new language is
that it is purely visual: (i) it has no notion of variable,
quantification, partiality, join, pattern matching, regular
expression, recursion, or any other construct proper to logical,
functional, or other database query languages, and (ii) it has a
very natural, strong, and intuitive design metaphor. The main
operation is that of copying and pasting in a schema tree.

We show that despite its simplicity, VCP precisely captures
complex-value algebra without powerset, or equivalently, monad
algebra with union and difference. Thus, its expressive power is
precisely that of the language that is usually considered to play
the role of relational algebra for complex-value databases.

1. INTRODUCTION 

Even though most modern database query languages are 
based on logical or algebraic foundations (or a combination 
of these, as is the case for SQL) to allow for the declarative or 
at least abstract specification of queries, inexperienced users 
are often overwhelmed by the task of writing queries. This 
has motivated efforts to develop easy-to-use visual query 
languages. 

The essential issue in defining good visual query languages 
is to provide strong visual metaphors for the constructs in a 
query and the steps to be taken to define it. Only by such 
strong visual metaphors can a visual language become easy 
to use for inexperienced users. Examples of such metaphors for
data management operations in the wider sense include deletion by
dragging an icon representing a data object onto a garbage can and
data relocation by dragging and dropping icons in a directory tree.

We distinguish between (1) graphical (or graph-based) query
languages such as QBE [19] and Graphlog [3] and (2) visual
languages in which the specification of the query is a process,
i.e., the query is defined by the interaction with the user rather
than by the static graphical outcome.¹ For example, manipulating a
directory hierarchy in an operating system window manager usually
involves a sequence of insertion, copy, and deletion steps which
together define the transformation on the file system to be
performed. From the outcome alone (a directory structure), it is
not possible to tell which transformation was carried out.

The interaction of a user with a system may give a sur- 
prising amount of expressive power while allowing the visual 
approach to remain intuitive. Static graphical languages 
tend to offer more powerful constructs (such as variables and 
quantifiers) to compensate for this. Even if such constructs 
are provided via graphical objects, queries still closely cor- 
respond to traditional textual query languages, and much of 
the appeal of visual specification is lost. 

QBE [19] is a visual query language for relational databases and
an attractive alternative to relational algebra. In its original
form, it is widely taught but has failed to have a large impact on
database practice. One reason for this may be its reliance on
variables, which are required to perform joins and may be a
concept that non-expert users find hard to deal with. One
seemingly close cousin of QBE, the visual language employed in
Microsoft Access, avoids the use of variables by replacing them
with lines that connect the table columns to be joined. Still,
lines are an unsuitable metaphor for multi-argument and multi-way
joins. The otherwise intuitive design metaphor of QBE (and equally
the visual query language of Access) also breaks down when one
wants to go beyond conjunctive queries and employ union and
difference.

Even greater problems are encountered when a visual language is
sought for nested relational or complex-value databases [?].
QBE-like tables can easily be nested visually within each other,
but there seems to be no clear, and sufficiently expressive,
semantics for multiple occurrences of variables (which give us
joins in QBE). Such a semantics would also need to be able to
express functionality beyond joins, such as nesting and unnesting.
Previous work on languages for complex-value databases has
therefore either resulted in expressively rather weak languages or
has compromised simplicity (the existence of a strong visual design
metaphor) to obtain the right expressiveness.

Besides QBE, another example of a language of the first kind is
visXCerpt [?], a GUI-based query language for XML that uses
graphical primitives for notions such as variables, quantification,
recursion, and partiality. The expressive power of visXCerpt has
not been formally studied, but appears to be very high
(Turing-complete). QURSED [14], another GUI-based XML query
language, trades expressiveness for simplicity (even though no
formal study of its expressive power is available). There are no
explicit graphical objects for variables, but the user is required
to have a notion of concepts such as variables and Boolean
combinations of conditions in order to use the query builder
successfully. Both the visXCerpt and the QURSED tool use a number
of visual operations to build queries (such as dragging and
dropping data locations); however, the query is the static outcome
of this process.

¹ Languages of the first class are traditionally called visual as
well.

XQBE [2] and XML-GL [8] are graphical languages for queries on
XML. Neither has a notion of explicit variables; instead, lines are
drawn similarly to the visual tool of MS Access. A query consists
of a source part and a construct part (both based on graphical
schema representations). No study of the expressive power of these
languages is available, but their queries seem to correspond to the
nested (XML) analogs of restricted classes of conjunctive queries.

In Graphlog [?], queries are graphs with node and edge
annotations, which are to be matched against a graph database (in
the spirit of finding subgraph homomorphisms). Graphlog handles a
class of linear recursive queries by allowing certain regular
expressions over relation names on edges, where such relations can
be defined by Graphlog query graphs and used cyclically. When
recursion is used sparingly, Graphlog queries tend to be easy to
read. The expressive power of Graphlog is studied formally in [?],
and it turns out that the language has a number of nice
characterizations. However, only binary relations can be defined,
so no general data transformations are possible and the
expressiveness does not match that required for complex-value
databases. Similar approaches are taken in G-Log [15] and in
Graphlog's predecessor G+ [?].

Lixto [?], a visual Web wrapper generator, is based on an intuitive
metaphor for information extraction from Web pages: regions of a
Web page are selected with the mouse, and "patterns" are associated
with such regions. However, Lixto only covers unary (tree
node-selecting) queries rather than data transformation queries,
which is sufficient in the wrapping context. The core language of
Lixto has been shown to capture precisely an important and
well-studied class of queries, those definable in monadic
second-order logic over trees [11]; however, to achieve this,
Lixto has to resort to recursion [?], for which no visual metaphor
is provided.

To this day, there is no truly visual language that captures the
expressiveness of any of the clean, theoretically founded data
transformation languages such as relational algebra or
complex-value algebra [?]. This paper aims to improve on this
situation by proposing a language that appears to satisfy these
desiderata. Our contributions are as follows.

• A visual language, VCP, for queries on complex-value databases
is proposed. The main strength of the new language is that it is
purely visual: (i) it has no notion of variable, quantification,
partiality, join, pattern matching, regular expression, recursion,
or any other construct proper to logical, functional, or other
database query languages, and (ii) it has a very natural, strong,
and intuitive design metaphor.

The only operations available and needed in VCP are copying and
pasting in a schema tree, as well as inserting, renaming, and
deleting nodes and filtering sets. A schema tree uses only three
kinds of nodes, namely set-typed and tuple-typed nodes and atomic
value leaves.

The only (oblique) advanced notion that the user has to deal with
is that of a collection (a set). Schema trees can be understood and
treated similarly to the directory trees that have become
commonplace in OS window managers (with directories as collections
of files).

• We show that despite its simplicity, VCP precisely captures
complex-value algebra without powerset [?], or equivalently, monad
algebra with union and difference [?, 17]. Thus, its expressive
power is precisely that of the language that is usually considered
to play the role of relational algebra for complex-value databases
(cf. [?]).

These languages are also a foundation of XML query languages such
as XQuery, and are probably very similar to them in expressive
power. Even though this remains to be verified, it makes it likely
that VCP can give rise to the very first well-founded visual XML
query language.

VCP is a member of the second class of query languages defined
above: queries are defined as an interactive process. It turns out
that this interaction gives us so much expressiveness that we need
only very simple query constructs and operations.

Example 1.1. We discuss an example VCP query informally to provide
a first idea of the language. Consider a relational database with
two relations books(isbn, title, year) and authors(isbn, name),
modeling books and their (possibly multiple) authors.

Throughout this paper, we will focus on a complex-value data model
in which relations may be nested into each other. A graphical
(tree-based) representation of the schema (a schema tree) is shown
in Figure 1(a); detailed definitions will follow in the technical
sections of the paper. "Dom" denotes the domain of atomic values
such as strings and integers. A corresponding data tree, modeling
two books and their authors, is shown in Figure 1(b).

We will specify a query that maps every complex-value database of
our given schema to a complex value consisting of a set of book
tuples of the form (isbn, title, authors), where "authors" is the
set of authors of the book. That is, this query nests the authors
of each book into their book tuple. The intended query result and a
corresponding schema tree can be found in Figures 2(b) and 2(a),
respectively.

In VCP, we can define this query visually, by a number of
interactive modifications of the schema tree. We proceed as
follows. (1) First we execute a bulk copy operation of the
"authors" relation into each tuple of the "books" relation by
simply dragging the "authors" subtree of the schema tree onto the
node representing the "books" tuples (adding another relation-typed
attribute/column to the "books" relation). This is shown in
Figure 1(c).

(2) Then we delete the "year" edge from the schema tree, which
removes this attribute from "books" and thus the years from all
book entries in the database. (Because of space limitations, not
all steps have their own figure; this operation can be found in
Figure 1(c).)
Figure 1: Initial data tree (b) and query steps (a), (c)-(f) of Example 1.1.



(3) Next we make a bulk copy of the "isbn" attribute of each book
tuple t to each of the author tuples in the "authors" attribute of
t. This is performed by dragging the isbn edge emanating from the
node representing the book tuples to the node representing the
author tuples nested inside books (see Figure 1(d)). In order not
to have two "isbn" attributes in the nested authors, though, we
first rename the "isbn" attribute of the nested authors to "isbn2".

(4) We will not need the original authors relation anymore, so we
can remove it from the complex value computed as the query result
by deleting the top-level authors subtree from the schema tree
(Figure 1(d)).

(5) Now we apply a "select" operation on the schema tree node
corresponding to the "authors" relation nested inside the books and
filter out those authors for which isbn ≠ isbn2, i.e., for each
book tuple t, we keep only those author tuples that really belong
to book t (Figure 1(e)).

(6) Then we can eliminate the "isbn" and "isbn2" attributes of the
nested author tuples by deleting their subtrees (Figure 1(e)).

(7) Finally, we eliminate the "authors" tuple node, which has only
one child (in other terms, a single attribute). This transforms,
for each book tuple, the "authors" attribute from a set of unary
tuples to a set of domain values (author names) (Figure 1(e)).²

As stated above, the final schema tree and the query result are
shown in Figure 2. □

Figure 2: Conclusion of Example 1.1.

Further examples of VCP queries can be found in Figures 3, 4, and
5. These examples, however, serve a double duty: they also
demonstrate certain expressiveness arguments, and are therefore
somewhat more abstract.

The structure of the paper is as follows. First, in Section 2, we
introduce types for complex values and their corresponding schema
trees. Moreover, we give an introduction to monad algebra. Section
3 presents the language VCP and gives a number of examples. In
Section 4, it is shown that VCP precisely captures the expressive
power of monad algebra. We conclude with a discussion of VCP and
future work (Section 5).


2. PRELIMINARIES 
2.1 Schema Trees 

We model complex values constructed from sets, tuples, and atomic
values from a single-sorted domain³ in the usual way. Types are
terms of the grammar

$$\tau \;::=\; \mathrm{Dom} \;\mid\; \{\tau\} \;\mid\; \langle A_1:\tau, \ldots, A_k:\tau \rangle$$

where $k \ge 0$. Type terms have an obvious tree representation; we
will call such trees schema trees.

Note that schema trees have three kinds of nodes (tuple, set, and
atomic value type nodes) and two kinds of edges (tuple edges and
set edges), of which the tuple edges carry an attribute name as
label. An example of a schema tree for the type
$\langle R: \{\langle A:\tau_1, B:\tau_2\rangle\}, S: \{\langle C:\tau_3, D:\tau_4\rangle\}\rangle$
can be found in Figure 3(a). If
$\tau_1 = \tau_2 = \tau_3 = \tau_4 = \mathrm{Dom}$, this type is an
appropriate representation of the relational schema R(AB), S(CD).
(Here and later, we model a relational database as a tuple of
relations to get a single complex value.)
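As a concrete illustration of the type grammar and of schema trees,
the Python sketch below encodes types as small immutable classes and
checks values against them. It is a sketch under our own assumptions,
not part of VCP: the names `Dom`, `SetT`, `TupT`, `conforms`, and
`schema_paths` are hypothetical, tuple values are Python dicts, set
values are Python sets or lists, and set edges are labeled "*" as in
the path notation used for VCP.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Dom:
    """Type of atomic values from the single-sorted domain."""


@dataclass(frozen=True)
class SetT:
    """Set type {elem}."""
    elem: "Type"


@dataclass(frozen=True)
class TupT:
    """Tuple type <A1: t1, ..., Ak: tk>; k = 0 gives the nullary tuple type."""
    attrs: Tuple[Tuple[str, "Type"], ...]


Type = Union[Dom, SetT, TupT]


def conforms(value, t: Type) -> bool:
    """Check that a complex value (atoms, sets/lists, dict tuples) has type t."""
    if isinstance(t, Dom):
        return isinstance(value, (str, int))
    if isinstance(t, SetT):
        return isinstance(value, (set, frozenset, list)) and all(
            conforms(v, t.elem) for v in value)
    names = [a for a, _ in t.attrs]
    return (isinstance(value, dict) and sorted(value) == sorted(names)
            and all(conforms(value[a], ty) for a, ty in t.attrs))


def schema_paths(t: Type, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the root path of every schema-tree node; set edges are labeled '*'."""
    yield prefix
    if isinstance(t, SetT):
        yield from schema_paths(t.elem, prefix + ("*",))
    elif isinstance(t, TupT):
        for a, ty in t.attrs:
            yield from schema_paths(ty, prefix + (a,))


# <R: {<A: Dom, B: Dom>}, S: {<C: Dom, D: Dom>}>, i.e., the relational schema R(AB), S(CD)
RS = TupT((("R", SetT(TupT((("A", Dom()), ("B", Dom()))))),
           ("S", SetT(TupT((("C", Dom()), ("D", Dom())))))))
```

With this encoding, `schema_paths(RS)` yields one path per node of the
schema tree, nine in total for this type.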

2.2 Monad Algebra 

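Consider the query language on complex values consisting of
expressions built from the following operations (the types of the
operations are provided as well). We refer to this language as
monad algebra, $\mathcal{M}$.

1. identity
$$\mathrm{id}: x \mapsto x \qquad \tau \to \tau$$

2. composition
$$f \circ g: x \mapsto g(f(x)) \qquad \frac{f: \tau \to \tau' \quad g: \tau' \to \tau''}{f \circ g: \tau \to \tau''}$$

3. constants from Dom ∪ {∅, ⟨⟩} (⟨⟩ is the nullary tuple)

4. singleton set construction
$$\mathrm{sing}: x \mapsto \{x\} \qquad \tau \to \{\tau\}$$

5. application of a function to every member of a set
$$\mathrm{map}(f): X \mapsto \{f(x) \mid x \in X\} \qquad \frac{f: \tau \to \tau'}{\mathrm{map}(f): \{\tau\} \to \{\tau'\}}$$

6. flatten
$$\mathrm{flatten}: X \mapsto \textstyle\bigcup X \qquad \{\{\tau\}\} \to \{\tau\}$$

7. pairing⁴
$$\mathrm{pairwith}_{A_1}: \langle A_1: X_1, A_2: x_2, \ldots, A_n: x_n\rangle \mapsto \{\langle A_1: x_1, A_2: x_2, \ldots, A_n: x_n\rangle \mid x_1 \in X_1\}$$
$$\langle A_1: \{\tau_1\}, A_2: \tau_2, \ldots, A_n: \tau_n\rangle \to \{\langle A_1: \tau_1, \ldots, A_n: \tau_n\rangle\}$$

8. tuple formation
$$\langle A_1: f_1, \ldots, A_n: f_n\rangle: x \mapsto \langle A_1: f_1(x), \ldots, A_n: f_n(x)\rangle$$
$$\frac{f_1: \tau \to \tau_1 \quad \cdots \quad f_n: \tau \to \tau_n}{\langle A_1: f_1, \ldots, A_n: f_n\rangle: \tau \to \langle A_1: \tau_1, \ldots, A_n: \tau_n\rangle}$$

9. projection
$$\pi_{A_i}: \langle A_1: x_1, \ldots, A_i: x_i, \ldots, A_n: x_n\rangle \mapsto x_i \qquad \langle A_1: \tau_1, \ldots, A_n: \tau_n\rangle \to \tau_i$$

² Visually, this operation cuts out a node from a path in the schema
tree, rather than deleting the subtree rooted by the node
eliminated.

³ All results in this paper immediately generalize to many-sorted
domains.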

The language has a strong and clean theoretical foundation in
programming language theory, for which we refer to, e.g., [17].
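The nine operations can be transcribed almost literally into Python,
which may help when experimenting with monad algebra expressions. The
sketch below is an illustration under our own assumptions rather than
a reference implementation: sets are `frozenset`s, tuples are a
hypothetical hashable `dict` subclass `Tup`, `ident` and `mapf` stand
for id and map (to avoid shadowing Python built-ins), and
`compose(f, g)` applies f first, matching the convention
f ∘ g : x ↦ g(f(x)).

```python
from functools import reduce


class Tup(dict):
    """A complex-value tuple <A1: v1, ..., An: vn>; hashable so it can sit in sets.
    Treat instances as immutable."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def ident(x):                      # 1. identity
    return x


def compose(*fs):                  # 2. f o g means "first f, then g"
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)


def const(c):                      # 3. constants (c from Dom, frozenset(), or Tup())
    return lambda _x: c


def sing(x):                       # 4. singleton set construction
    return frozenset([x])


def mapf(f):                       # 5. map(f)
    return lambda xs: frozenset(f(x) for x in xs)


def flatten(xss):                  # 6. flatten: union of a set of sets
    return frozenset(x for xs in xss for x in xs)


def pairwith(a):                   # 7. pairwith_A
    return lambda t: frozenset(Tup({**t, a: x}) for x in t[a])


def tup(fs):                       # 8. tuple formation <A1: f1, ..., An: fn>
    return lambda x: Tup({a: f(x) for a, f in fs.items()})


def proj(a):                       # 9. projection pi_A
    return lambda t: t[a]
```

For example, `pairwith("A")` applied to ⟨A: {1, 2}, B: b⟩ yields the
set {⟨A: 1, B: b⟩, ⟨A: 2, B: b⟩}.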

Returning to the definition of our operations, note that projection
is applied to tuples rather than to sets of tuples as in relational
algebra. For example, the relational algebra expression
$\pi_{AB}$ (on some relation with at least the columns A and B)
corresponds to $\mathrm{map}(\langle A: \pi_A, B: \pi_B\rangle)$ in
$\mathcal{M}$.
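The correspondence between relational projection and map can be
checked on a tiny example. In this hypothetical Python sketch, `Tup`,
`proj`, `tup`, and `mapf` are minimal stand-ins of our own for tuples,
π_A, tuple formation, and map; duplicates collapse because results are
sets.

```python
class Tup(dict):
    """Hashable complex-value tuple (treat as immutable)."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def proj(a):
    return lambda t: t[a]


def tup(fs):
    return lambda x: Tup({a: f(x) for a, f in fs.items()})


def mapf(f):
    return lambda xs: frozenset(f(x) for x in xs)


# Relational pi_{AB} expressed in monad algebra as map(<A: pi_A, B: pi_B>)
pi_AB = mapf(tup({"A": proj("A"), "B": proj("B")}))

R = frozenset({Tup({"A": 1, "B": 2, "C": 3}), Tup({"A": 1, "B": 2, "C": 4})})
print(pi_AB(R))  # a single tuple: duplicates collapse under set semantics
```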

By positive monad algebra $\mathcal{M}_\cup$, we denote
$\mathcal{M}$ extended by the set union operation. This language has
a number of nice properties [17], but it is known to be incomplete
as a practical query language because it cannot express selection,
set difference, or set intersection.

However, if we extend $\mathcal{M}_\cup$ by any nonempty subset of
the operations selection (of the form $\sigma_{A=B}$ on complex
values of type $\{\langle A:\tau, B:\tau', \ldots\rangle\}$, where "="
denotes deep equality of complex values), set difference "−", set
intersection "∩", or nesting⁵, we always get the same expressive
power. We will call any one of these extended languages full monad
algebra.

Theorem 2.1 ([17]). $\mathcal{M}_\cup[\sigma] = \mathcal{M}_\cup[-] = \mathcal{M}_\cup[\cap] = \mathcal{M}_\cup[\mathrm{nest}]$.

Moreover, generalizing selections to test against constants or to
support "∈", "⊆", or Boolean combinations of conditions does not
increase the expressiveness of full monad algebra [17].

⁴ Operations $\mathrm{pairwith}_{A_i}$ can be defined analogously.
⁵ The "nest" operation of complex-value algebra without powerset [1]
groups tuples by some of their attributes. For example,
$\mathrm{nest}_{C=(B)}(R)$ on relation R(AB) computes the value
$\{\langle A: x, C: \{\langle B: y\rangle \mid \langle A: x, B: y\rangle \in R\}\rangle \mid (\exists y)\, \langle A: x, B: y\rangle \in R\}$.



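Example 2.2. The query of Example 1.1 can be phrased in
$\mathcal{M}_\cup[\sigma]$ as

$$\begin{aligned}
\langle \mathit{books}:\ & \mathrm{pairwith}_{\mathit{books}} \circ \mathrm{map}(\langle \mathit{isbn}: \pi_{\mathit{books}} \circ \pi_{\mathit{isbn}},\ \mathit{title}: \pi_{\mathit{books}} \circ \pi_{\mathit{title}}, \\
& \mathit{authors}: \langle 1: \pi_{\mathit{books}} \circ \pi_{\mathit{isbn}},\ 2: \pi_{\mathit{authors}}\rangle \circ \mathrm{pairwith}_2 \circ \mathrm{map}(\langle \mathit{isbn}: \pi_1, \\
& \qquad \mathit{isbn2}: \pi_2 \circ \pi_{\mathit{isbn}},\ \mathit{name}: \pi_2 \circ \pi_{\mathit{name}}\rangle) \circ \sigma_{\mathit{isbn}=\mathit{isbn2}} \circ \mathrm{map}(\pi_{\mathit{name}})\rangle)\rangle
\end{aligned}$$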

That is, we first pair each book tuple with the set of all authors.
Using the "map" operation, we then process each of these pairs as
follows. For each such pair, we pair the book's isbn number with
each of the author tuples. Then we are able to select those authors
that belong to the book (using σ). Each book tuple is made a triple
of an isbn number, a title, and a set of authors. The latter is a
set of names, rather than of tuples of isbn, isbn2, and name,
created by mapping $\pi_{\mathit{name}}$ onto the set of author
tuples. □
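To check that the expression of Example 2.2 computes the intended
nesting, it can be evaluated on a small database. The following Python
sketch is illustrative only: the combinator names are our own, the
sample isbns, titles, and author names are made up, `select(a, b)`
plays the role of σ_{A=B}, and the inner pairing is done on attribute
2, the attribute that holds the set of all authors.

```python
from functools import reduce


class Tup(dict):
    """Hashable complex-value tuple (treat as immutable)."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def compose(*fs):  # left to right, like f o g in the text
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)


def mapf(f):
    return lambda xs: frozenset(f(x) for x in xs)


def pairwith(a):
    return lambda t: frozenset(Tup({**t, a: x}) for x in t[a])


def tup(fs):
    return lambda x: Tup({a: f(x) for a, f in fs.items()})


def proj(a):
    return lambda t: t[a]


def select(a, b):  # sigma_{A=B}; == on Tup/frozenset is deep equality
    return lambda xs: frozenset(t for t in xs if t[a] == t[b])


book_authors = compose(
    tup({"1": compose(proj("books"), proj("isbn")), "2": proj("authors")}),
    pairwith("2"),
    mapf(tup({"isbn": proj("1"),
              "isbn2": compose(proj("2"), proj("isbn")),
              "name": compose(proj("2"), proj("name"))})),
    select("isbn", "isbn2"),
    mapf(proj("name")),
)

query = tup({"books": compose(pairwith("books"), mapf(tup({
    "isbn": compose(proj("books"), proj("isbn")),
    "title": compose(proj("books"), proj("title")),
    "authors": book_authors,
})))})

# Hypothetical sample database <books: {...}, authors: {...}>
db = Tup({
    "books": frozenset({Tup({"isbn": "i1", "title": "T1", "year": 1988}),
                        Tup({"isbn": "i2", "title": "T2", "year": 2002})}),
    "authors": frozenset({Tup({"isbn": "i1", "name": "Ann"}),
                          Tup({"isbn": "i1", "name": "Bob"}),
                          Tup({"isbn": "i2", "name": "Cy"})}),
})
result = query(db)
```

The year attribute disappears simply because the outer tuple formation
does not mention it, just as in the expression above.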

Theorem 2.1 demonstrates that full monad algebra (say,
$\mathcal{M}_\cup[\sigma]$) is a very robust notion. It can serve as
an "expressiveness benchmark" for query languages on complex-value
databases. Indeed, it has been shown that full monad algebra is a
conservative extension of relational algebra:⁶

Theorem 2.3. A mapping from a (flat) relational database to a (flat)
relation is expressible in $\mathcal{M}_\cup[\sigma]$ if and only if
it is expressible in relational algebra.

It was discovered (see, e.g., [17] for a detailed discussion) that
virtually all the complex-value query languages developed in the
eighties and nineties that were intended for practical use are
expressively equivalent. Monad algebra is one of them, and so are
nested relational algebra [12] and complex-value algebra without
the powerset operation [3]. Thus, the expressiveness results
obtained in this paper for VCP show that this language is
equivalent to all of them.

3. THE VCP LANGUAGE 

In this section, we introduce VCP, a query language that transforms
complex-value databases through a sequence of operations on a
schema tree. VCP has two dynamic aspects: first, how VCP operations
transform the schema tree at the time of specification (discussed
in Section 3.1), and second, the semantics of VCP queries on data
(Section 3.2). We provide three larger examples in Section 3.3.

3.1 Query Specification as a Process 

Let nca(v, e) denote the nearest common ancestor node of node v and
edge e in the schema tree.
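Because schema trees are deterministic, nodes can be identified with
their root paths, and nca then reduces to a longest common prefix. The
Python sketch below makes this concrete and also checks the path
conditions of copy operations 8 and 9 in Table 1. It rests on an
assumption of ours that the text does not fix: an edge is identified
by the path of the node it leads to, so its source node is that path
minus the last label. Only the path conditions are checked, not the
label and type conditions; all names are hypothetical.

```python
from typing import Tuple

Path = Tuple[str, ...]  # edge labels from the root; "*" labels set edges


def nca(v: Path, e: Path) -> Path:
    """Nearest common ancestor of node v and edge e.

    Assumption: edge e is given by the path of the node it leads to,
    so e[:-1] is the node the edge emanates from."""
    src = e[:-1]
    n = 0
    while n < min(len(v), len(src)) and v[n] == src[n]:
        n += 1
    return v[:n]


def path_to_edge(v: Path, e: Path) -> Path:
    """Edge labels from nca(v, e) down to and including e."""
    return e[len(nca(v, e)):]


def tuple_copy_allowed(v: Path, e: Path) -> bool:
    """Path condition of operation 8: the path must be *-free."""
    return "*" not in path_to_edge(v, e)


def set_copy_allowed(v: Path, e: Path) -> bool:
    """Path condition of operation 9: exactly one '*', namely e itself."""
    p = path_to_edge(v, e)
    return p.count("*") == 1 and p[-1] == "*"
```

For step (3) of Example 1.1, copying books.*.isbn to books.*.authors.*
gives nca = books.*, and the remaining path consists of the single
edge isbn, so the copy is allowed; copying the same edge to the root
would cross a "*" and is rejected.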

The operations of the VCP language are presented in Table 1.
Operations 8 and 9 are implemented visually by drag&drop in the
schema tree. The other operations are local to a node or edge in
the schema tree.

We have been parsimonious with the number of operations introduced
in Table 1. To make query specification faster and more convenient,
further visual operations can be added, such as a "delete subtree"
operation on set edges (which empties sets).⁷

⁶ A generalized version of Theorem 2.3 can be found in [18].
⁷ This operation is redundant with tuple insertion, subtree deletion
on tuple edges, addition of constants, and tuple elimination.

Table 1: VCP operations.

1. new constant (A, c). Applied to: tuple node v. Conditions: none.
   Function: adds a new attribute A with constant value
   c ∈ Dom ∪ {∅, ⟨⟩} to v. Example: Fig. 4(d).
2. insert tuple (A). Applied to: node v. Conditions: none.
   Function: replaces the subtree rooted by v by ⟨A: v⟩, i.e., by a
   new tuple node with v as A-child. Example: Fig. ?(a).
3. insert set. Applied to: node v. Conditions: none. Function:
   replaces the subtree rooted by v by {v}, i.e., by a new set node
   with v as child.
4. rename (A). Applied to: tuple edge e. Conditions: none.
   Function: renames the label of e (= an attribute of a tuple) to
   A. Example: Fig. 4(b).
5. eliminate. Applied to: set node v. Conditions: the parent node of
   v is also a set node. Function: cuts v; if v = {τ}, then v is
   replaced in the schema tree by τ. Example: Fig. 3(e).
6. eliminate. Applied to: tuple node v. Conditions: the subtree
   rooted by v is of the form ⟨A: τ⟩, i.e., it has precisely one
   attribute. Function: replaces v by τ. Example: Fig. 3(d).
7. delete [subtree]. Applied to: tuple edge e. Conditions: none.
   Function: deletes e and its subtree; reduces the arity of the
   tuple by one. Example: Fig. 4(f).
8. copy to (v). Applied to: tuple edge e. Conditions: the label of e
   must not yet exist in the destination tuple node v; the path from
   nca(v, e) to e must be *-free. Function: adds e and its subtree to
   tuple node v. Example: Fig. 4(c).
9. copy to (v). Applied to: set edge e. Conditions: the types below
   e and destination set node v must be equal; the path from
   nca(v, e) to e must contain precisely one * (i.e., the label of
   e). Function: does not modify the schema tree.
10. select (A, B). Applied to: set node v. Conditions: the child of
   v must be a tuple of type ⟨A: τ_A, B: τ_B, ...⟩. Function: does
   not modify the schema tree. Example: Fig. 4(e).

3.2 Informal Semantics

In order to obtain a simple informal semantics for VCP, it is
convenient to think of complex data values as trees themselves.
There are two main differences between schema trees and data trees.
In the former, *-nodes have precisely one child, while in the
latter, *-nodes may have many (one for each of the members of the
set). Moreover, leaf nodes of schema trees are labeled "Dom", ∅, or
⟨⟩, while data trees have atomic values instead of "Dom" at the
leaves. (See Figure 1(a) or 2(a) for an example of a schema tree and
Figure 1(b) resp. 2(b) for a data tree compatible with the schema
tree. In order to avoid confusion, but also to be economical with
space, all data trees in this paper are rotated by 90 degrees
relative to schema trees.)

By the path between two nodes v, w (where v is an ancestor of w),
we denote the sequence of edge labels through which w is reachable
from v. Note that in a schema tree, for any node v, each path π
uniquely identifies a node reachable from v via π. In other words,
schema trees are deterministic trees. We can identify nodes of the
schema tree with their paths from the root node (e.g., we can talk
of "the node books.*.isbn" in Figure 1(a)). This is generally not
the case for data trees. A given path in the data tree may match a
set of nodes; e.g., relative to the root node of Figure 1(b),
books.*.year matches "1988" and "2002".

In the following, we say that a VCP operation is schema-local to a
subtree of the schema tree if all the visual steps necessary to
specify the operation can be taken exclusively in the subtree.

There are three kinds of operations in VCP: (a) operations on nodes
(1, 2, 3, 5, 6, and 10 of Table 1), (b) operations on edges (4 and 7
of Table 1), and (c) copy-paste operations (8 and 9 of Table 1).
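The difference between deterministic schema trees and data trees can
be seen with a few lines of Python. In this sketch (our own encoding,
with hypothetical names and made-up isbns and titles; only the years
come from the text), tuples are dicts and sets are lists, and `match`
returns every node reachable via a path. On the schema tree, whose set
nodes have exactly one child, the same function always returns a
single node.

```python
def match(node, path):
    """All data-tree nodes reachable from node via path.

    Tuples are dicts, sets are lists; '*' steps to every member of a set."""
    nodes = [node]
    for label in path:
        if label == "*":
            nodes = [m for n in nodes for m in n]
        else:
            nodes = [n[label] for n in nodes]
    return nodes


# Hypothetical data tree in the spirit of Figure 1(b)
data = {"books": [{"isbn": "i1", "title": "T1", "year": 1988},
                  {"isbn": "i2", "title": "T2", "year": 2002}],
        "authors": [{"isbn": "i1", "name": "Ann"}]}

# The schema tree in the same encoding: every set node has exactly one child
schema = {"books": [{"isbn": "Dom", "title": "Dom", "year": "Dom"}],
          "authors": [{"isbn": "Dom", "name": "Dom"}]}

print(match(data, ("books", "*", "year")))         # [1988, 2002]
print(len(match(schema, ("books", "*", "year"))))  # 1
```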



By the context node of an operation o, denoted ctx(o), we refer to

• the node of the schema tree that o is applied to if o is a node
operation (a), with the exception of set node elimination, where
ctx(o) is the parent of the eliminated node (a *-node as well),

• the node from which the edge that o is applied to emanates if o is
an edge operation (b), and

• the node nca(w, e) for a copy operation (c) that copies from edge
e to node w.

Of course, o is schema-local w.r.t. the subtree of ctx(o).

The local semantics function $L[\![o]\!]$ maps a complex value to a
complex value, or in other words, a data tree to a data tree.
$L[\![\mathrm{select}(A=B)]\!](u)$, where u is a *-node of the data
tree, removes all those members of set u (= children of u with their
subtrees) that do not satisfy the selection condition A = B.
$L[\![\mathrm{eliminate}]\!](u)$ flattens the set of sets u, i.e.,
replaces the subtree rooted by u by $\bigcup u$. For all other
operations o of types (a) and (b), $L[\![o]\!]$ on u is basically the
same operation as described in Table 1 for the schema tree, applied
to node u (or an emanating edge) in the data tree.
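The two local semantics spelled out above can be sketched as in-place
operations on a data tree. In this hypothetical Python encoding, sets
are lists and tuples are dicts; `local_select` and `local_eliminate`
are our names, and the duplicate check in `local_eliminate` is our way
of keeping set semantics with lists.

```python
def local_select(u, a, b):
    """L[[select(A=B)]](u): drop the members t of set u with t[a] != t[b]."""
    u[:] = [t for t in u if t[a] == t[b]]


def local_eliminate(u):
    """L[[eliminate]](u): replace the set of sets u by the union of its members."""
    flat = []
    for member in u:
        for x in member:
            if x not in flat:  # keep set semantics: no duplicate members
                flat.append(x)
    u[:] = flat


# Step (5) of Example 1.1 on one nested authors set (made-up values)
authors = [{"isbn": "i1", "isbn2": "i1", "name": "Ann"},
           {"isbn": "i1", "isbn2": "i2", "name": "Cy"}]
local_select(authors, "isbn", "isbn2")

nested = [[1, 2], [2, 3]]
local_eliminate(nested)
```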

The local semantics of copy operations (c) is as follows. For a
copy-paste operation o (type 8 or 9 in Table 1) from source edge e
to destination node w, let $\pi_{\mathit{from}}$ be the path from
ctx(o) = nca(w, e) to e and let $\pi_{\mathit{to}}$ be the
destination path from ctx(o) to w. If $\pi_{\mathit{to}}$ contains
*-edges, we say that o is a bulk-copy operation. If o is on tuples
(type 8), $\pi_{\mathit{from}}$ consists only of tuple nodes, and
$L[\![o]\!](u)$ is obtained from u by copying the source
$u.\pi_{\mathit{from}}$ to each of the tuple nodes matching
$u.\pi_{\mathit{to}}$. If o is on sets (type 9),
$\pi_{\mathit{from}}$ consists only of tuple nodes and one set node
s (the node from which e directly emanates), and $L[\![o]\!](u)$ is
obtained from u by copying each of the members of the source set
reachable from u via path $\pi_{\mathit{from}}$ (excluding the final
"*") to each of the destination sets matching
$u.\pi_{\mathit{to}}$.

Figure 3: Simulating the cartesian product R × S of flat relational algebra in VCP.

If the path π from the root of the schema tree to ctx(o) contains
*-nodes, we call o a bulk operation. Operation o is executed by
replacing each node u reachable through π from the root of the data
tree by $L[\![o]\!](u)$.
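The following Python sketch puts the local semantics of tuple
copy-paste together with the bulk execution rule and replays steps (1)
and (3) of Example 1.1 on a small data tree. Everything here is an
illustrative assumption: `match`, `execute`, `copy_tuple_edge`, and
`rename` are hypothetical helpers, the data are made up, and the
source path π_from is split into a path to the source tuple plus the
label of edge e.

```python
import copy


def match(node, path):
    """All data-tree nodes reachable from node via path (dict = tuple, list = set)."""
    nodes = [node]
    for label in path:
        if label == "*":
            nodes = [m for n in nodes for m in n]
        else:
            nodes = [n[label] for n in nodes]
    return nodes


def execute(root, ctx_path, local_op):
    """Apply local_op to every data node reached via the schema path of ctx(o);
    if ctx_path contains '*', this is a bulk operation."""
    for u in match(root, ctx_path):
        local_op(u)


def copy_tuple_edge(src_path, label, dst_path):
    """Local semantics of copy-paste on tuples (operation 8): copy the edge `label`
    of the tuple at u.src_path into every tuple node matching u.dst_path."""
    def op(u):
        (src,) = match(u, src_path)  # the source path is *-free: exactly one node
        for t in match(u, dst_path):
            t[label] = copy.deepcopy(src[label])
    return op


def rename(old, new):
    """Local semantics of rename on the edge `old` of a tuple node u."""
    def op(u):
        u[new] = u.pop(old)
    return op


db = {"books": [{"isbn": "i1", "title": "T1"}, {"isbn": "i2", "title": "T2"}],
      "authors": [{"isbn": "i1", "name": "Ann"}, {"isbn": "i2", "name": "Cy"}]}

# Step (1): ctx(o) is the root, but the destination path books.* makes it a bulk copy.
execute(db, (), copy_tuple_edge((), "authors", ("books", "*")))
# Step (3): rename the nested isbn edges, then copy each book's isbn into its authors.
execute(db, ("books", "*", "authors", "*"), rename("isbn", "isbn2"))
execute(db, ("books", "*"), copy_tuple_edge((), "isbn", ("authors", "*")))
```

Deep copies keep the pasted subtrees independent, so the later rename
on books.*.authors.* leaves the top-level authors relation untouched.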

Example 3.1. Consider again the schema and the data tree of Figure
1(a) and (b), respectively. The operation "insert tuple (A)" on the
schema tree node books.*.year replaces the values of year attributes
in tuples reachable through path books.* in the data tree by a unary
tuple node with the year as the value of attribute A. For an edge
manipulation example, "rename to publyear" on the schema tree node
books.*.year renames each of the "year" edges of tuple nodes
reachable through path books.* of the data tree to "publyear". □
Figure 4: Modeling difference R − S using selection in VCP.






3.3 Examples 

Figure 3 shows how the cartesian product R × S of two binary
relations R(A, B), S(C, D) can be encoded in VCP. Note that we
employ two bulk copy operations (or, here, "move" operations, to
save space) to pair the tuples of the two relations.

The operation "move" is obtained by first copying and subsequently
deleting the source. We use it here only to save space; a "bulk
copy" operation is really much more intuitive than a "move to many
places" operation.
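One way to read the cartesian product construction as executable steps
is sketched below in Python. It assumes our own reading of the figure:
first S is moved into every R-tuple, then A and B are moved into the
nested S-tuples, and finally the unary tuple node and the inner set
node are eliminated. The helpers `move` and `cartesian_product` are
hypothetical, with dicts for tuples and lists for sets.

```python
import copy


def match(node, path):
    nodes = [node]
    for label in path:
        nodes = [m for n in nodes for m in n] if label == "*" else [n[label] for n in nodes]
    return nodes


def move(root, ctx_path, label, dst_path):
    """Hypothetical 'move': bulk copy the edge `label` of each context node into
    every tuple matching dst_path below it, then delete the source edge."""
    for u in match(root, ctx_path):
        for t in match(u, dst_path):
            t[label] = copy.deepcopy(u[label])
        del u[label]


def cartesian_product(R, S):
    db = {"R": copy.deepcopy(R), "S": copy.deepcopy(S)}
    move(db, (), "S", ("R", "*"))              # every R-tuple receives a copy of S
    for attr in ("A", "B"):                    # push A and B into the nested S-tuples
        move(db, ("R", "*"), attr, ("S", "*"))
    nested = [t["S"] for t in db["R"]]         # eliminate the unary tuple node <S: ...>
    return [s for group in nested for s in group]  # eliminate the inner set node
```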

In Figure 4, we show how the difference R − S for unary relations
R(A) and S(A) can be encoded in VCP. The idea of the mapping is the
same as in Example 2.2, but we have gone to the full length of not
assuming a selection operation of the form $\sigma_{A=\emptyset}$.
We use only a single form of selection, $\sigma_{A=B}$. This query
may also serve as a template for proving that adding "−" to VCP does
not extend its expressive power.
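The idea behind the difference construction, testing emptiness using
only selections of the form σ_{A=B} plus a constant ∅, can be sketched
at the level of monad-algebra values. This Python sketch is our own
illustration, not a step-by-step transcription of Figure 4; `Tup`,
`select`, and `difference`, as well as the attribute names A2, M, and
E, are hypothetical.

```python
class Tup(dict):
    """Hashable complex-value tuple (treat as immutable)."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def select(xs, a, b):
    """sigma_{A=B}: the only form of selection used here."""
    return frozenset(t for t in xs if t[a] == t[b])


def difference(R, S):
    rows = set()
    for r in R:
        # pair r with every s in S, as tuples <A: r.A, A2: s.A>
        pairs = frozenset(Tup({"A": r["A"], "A2": s["A"]}) for s in S)
        # sigma_{A=A2}: the S-tuples equal to r
        matches = select(pairs, "A", "A2")
        # add the constant empty set as attribute E, to be compared with M
        rows.add(Tup({"A": r["A"], "M": matches, "E": frozenset()}))
    kept = select(frozenset(rows), "M", "E")   # sigma_{M=E}: no match in S
    return frozenset(Tup({"A": t["A"]}) for t in kept)
```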

Of course, difference "−" could also be supported directly in a GUI
based on VCP, so that this query can be realized in a single step.
Difference may not have the clean visual metaphor of the other VCP
operations, but it could be offered as an option for expert users.

Figure 5 shows how to implement the nest operation of complex-value
algebra without powerset [5] in VCP with selection. We encode
$\mathrm{nest}_{C=(B)}(R)$ on relation R(AB).
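The nest operation of footnote 5 can be written down directly, and the
same result can be obtained with pairing and an equality selection
only, which is the idea the VCP encoding relies on. The Python sketch
below is illustrative; `nest_C_B`, `nest_via_selection`, and the
attribute name A2 are our own, and the selection-based variant mirrors
the construction only in spirit.

```python
class Tup(dict):
    """Hashable complex-value tuple (treat as immutable)."""

    def __hash__(self):
        return hash(frozenset(self.items()))


def nest_C_B(R):
    """nest_{C=(B)}(R) on R(AB), following the definition in footnote 5."""
    return frozenset(
        Tup({"A": x, "C": frozenset(Tup({"B": t["B"]}) for t in R if t["A"] == x)})
        for x in {t["A"] for t in R})


def nest_via_selection(R):
    """Same query with pairing and sigma_{A=A2} only: for each tuple t of R,
    collect the B-values of all tuples that agree with t on A."""
    out = set()
    for t in R:
        pairs = frozenset(Tup({"A": t["A"], "A2": s["A"], "B": s["B"]}) for s in R)
        same_a = frozenset(p for p in pairs if p["A"] == p["A2"])  # sigma_{A=A2}
        out.add(Tup({"A": t["A"], "C": frozenset(Tup({"B": p["B"]}) for p in same_a)}))
    return frozenset(out)  # tuples with equal A coincide, so duplicates vanish
```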

It is important to note that each VCP query step is assumed to
consist of a single operation, even though we have sometimes
resorted to annotating schema trees in Figures 3, 4, and 5 with
several independent operations to save space.

4. EQUIVALENCE OF VCP AND MONAD 
ALGEBRA 

In this section, we characterize the expressive power of VCP by
showing that it captures full monad algebra
$\mathcal{M}_\cup[\sigma]$. We also show that VCP without selection
captures precisely positive monad algebra and that VCP without the
copy operation on sets and without selection coincides with
$\mathcal{M}$. These expressiveness results show that VCP captures
the "benchmark expressiveness" for complex-value databases.
Moreover, the proofs of the direction
VCP ⊆ $\mathcal{M}_\cup[\sigma]$ provide us with a translation from
VCP to full monad algebra that allows us to use existing query
engines and results on query evaluation for the latter language. The
proofs are straightforward and provide us with an alternative formal
semantics for VCP (through the monad algebra framework).

Figure 5: $\mathrm{nest}_{C=(B)}(R)$ in VCP.

Theorem 4.1. VCP[wo 'select'] ⊆ $\mathcal{M}_\cup$.

Proof Sketch: The proof is by induction. We define a function
$\mathcal{M}[\![\ldots]\!]$ that maps any VCP query to an equivalent
$\mathcal{M}_\cup$-expression.

• If operation o on schema $\langle A_1:\tau_1, \ldots, A_n:\tau_n\rangle$
is schema-local to $\tau_1$, then
$\mathcal{M}[\![o]\!] = \langle A_1: \pi_{A_1} \circ \mathcal{M}[\![o@A_1]\!], A_2: \pi_{A_2}, \ldots, A_n: \pi_{A_n}\rangle$,
where o@A_1 denotes o modified to apply to the $A_1$ branch of the
schema tree (i.e., the subtree corresponding to $\tau_1$).

• If operation o on schema $\tau = \{\tau'\}$ is schema-local to
$\tau'$, then $\mathcal{M}[\![o]\!] = \mathrm{map}(\mathcal{M}[\![o@*]\!])$,
where o@* denotes the operation o modified to apply to the subtree
corresponding to $\tau'$.



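For example, if o is an operation schema-local to the subtree rooted
by node v = R.*.B.* in a schema tree whose root tuple has the
attributes R and S and whose R-tuples have the attributes A, B, and
C, then

$$\mathcal{M}[\![o]\!] = \langle R: \pi_R \circ \mathrm{map}(\langle A: \pi_A,\ B: \pi_B \circ \mathrm{map}(\mathcal{M}[\![(((o@R)@*)@B)@*]\!]),\ C: \pi_C\rangle),\ S: \pi_S\rangle.$$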

• The deletion of subtrees, the renaming of tuple edges, and the
addition of new constants to tuples are handled by tuple creation,
constants, and projection.

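• Tuple-node and set-node elimination are encoded using projection
and "flatten", respectively: eliminating a tuple node of the form
⟨A: τ⟩ yields $\mathcal{M}[\![o]\!] = \pi_A$, and eliminating a set
node yields $\mathcal{M}[\![o]\!] = \mathrm{flatten}$.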

• A tuple node with edge "A" is inserted by
$\langle A: \mathrm{id}\rangle$. A set node is inserted by "sing".

• Copy-paste operations on tuples are mapped to $\mathcal{M}_\cup$
as shown for two important cases. Copying the B edge of the root
tuple into the tuple node A (whose attributes are C and D) yields

$$\mathcal{M}[\![o]\!] = \langle A: \langle B: \pi_B,\ C: \pi_A \circ \pi_C,\ D: \pi_A \circ \pi_D\rangle,\ B: \pi_B\rangle,$$

and copying the B edge into the tuples of the set A (whose members
have the attributes C and D) yields

$$\mathcal{M}[\![o]\!] = \langle A: \langle 1: \pi_A,\ 2: \pi_B\rangle \circ \mathrm{pairwith}_1 \circ \mathrm{map}(\langle B: \pi_2,\ C: \pi_1 \circ \pi_C,\ D: \pi_1 \circ \pi_D\rangle),\ B: \pi_B\rangle.$$

In general, the path from nca(v, e) to e can consist of a sequence
of tuple edges, from which we can extract the source node by a
sequence of projections. On the path from nca(v, e) to destination
node v, there can be a sequence of tuple nodes (handled using tuple
construction as in the first example above) and *-nodes (handled
using "pairwith" as in the second example).

• Copy-paste operations on sets are mapped to union. For example,
copying the members of set B into set A yields

$$\mathcal{M}[\![o]\!] = \langle A: \pi_A \cup \pi_B,\ B: \pi_B\rangle.$$

• A sequence of VCP operations $o_1, \ldots, o_n$, where each $o_i$
corresponds to an $\mathcal{M}_\cup$-expression $f_i$, simply
corresponds to the composition $((f_1 \circ f_2) \circ \ldots) \circ f_n$. □

The previous translation also yields

Corollary 4.2. VCP[wo 'copy to set node', 'select'] ⊆ $\mathcal{M}$.

For the other direction, we have

Theorem 4.3. $\mathcal{M}_\cup$ ⊆ VCP[wo 'select'].

Proof Sketch: The proof employs an inductive definition of a
function VCP[[. . .]] that maps $\mathcal{M}_\cup$ expressions to VCP
queries. Translating most operations is obvious, so we will just
discuss the most interesting ones:

• pairwith. The essential operation here is copy-paste. We implement
$\mathrm{pairwith}_A$ in VCP by a sequence of such operations.




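• tuple creation. We rewrite each $\mathcal{M}$-expression of the form $\langle A_1 : f_1, \ldots, A_n : f_n\rangle$ into

$$\langle A_1 : \mathrm{id}, \ldots, A_n : \mathrm{id}\rangle \circ \langle A_1 : \pi_{A_1} \circ f_1, \ldots, A_n : \pi_{A_n} \circ f_n\rangle.$$

Now, $\langle A_1 : \mathrm{id}, \ldots, A_n : \mathrm{id}\rangle$ simply means making a tuple of $n$ copies of the input value, which can be realized in VCP as shown in Figure 6 (for $n = 3$).

The second step, $\langle A_1 : \pi_{A_1} \circ f_1, \ldots, A_n : \pi_{A_n} \circ f_n\rangle$, just requires pushing each $f_i$ down into the $A_i$ branch, for each $1 \le i \le n$. That is, we transform the tuple of copies into $\mathrm{VCP}[\langle A_1 : \pi_{A_1} \circ f_1, \ldots, A_n : \pi_{A_n} \circ f_n\rangle]$ (figure not reproduced).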


• union is implemented using the copy operation be- 
tween sets. 

• constants. [Figure: the VCP steps "insert tuple(A)", "add constant c via B", and "delete"; not reproduced.] □
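The tuple-creation rewrite in the proof above can be checked on a small input. The sketch below is not from the paper: it assumes complex values are modeled as Python dicts (tuples), frozensets (sets), and scalars, reads $\circ$ left to right as the proofs do ($f \circ g$ applies $f$ first), and uses hypothetical helper names `compose`, `proj`, `tup`, and `two_step`.

```python
from typing import Any, Callable, Dict

Fn = Callable[[Any], Any]


def compose(*fs: Fn) -> Fn:
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    def run(x: Any) -> Any:
        for f in fs:
            x = f(x)
        return x
    return run


def ident(x: Any) -> Any:
    return x


def proj(a: str) -> Fn:
    """Projection pi_a on a tuple (modeled as a dict)."""
    return lambda x: x[a]


def tup(fields: Dict[str, Fn]) -> Fn:
    """Tuple construction <A1: f1, ..., An: fn>."""
    return lambda x: {a: f(x) for a, f in fields.items()}


def two_step(fields: Dict[str, Fn]) -> Fn:
    """<A1: id, ..., An: id> o <A1: pi_A1 o f1, ..., An: pi_An o fn>."""
    copies = tup({a: ident for a in fields})  # n copies of the input
    pushed = tup({a: compose(proj(a), f) for a, f in fields.items()})  # f_i inside branch A_i
    return compose(copies, pushed)


x = {"R": frozenset({1, 2}), "S": 7}
fs = {"A1": proj("S"), "A2": proj("R"), "A3": lambda v: len(v["R"])}
print(tup(fs)(x) == two_step(fs)(x))  # True
```

Both sides produce `{"A1": 7, "A2": frozenset({1, 2}), "A3": 2}`, which is the equality the rewrite relies on.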



Corollary 4.4. $\mathcal{M} \subseteq$ VCP[wo 'copy to set node', 'select'].

The same selection operation is available in VCP and 
$\mathcal{M}_\cup[\sigma]$. From the previous results, we obtain

Theorem 4.5. VCP $= \mathcal{M}_\cup[\sigma]$.
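To make the copy-paste translations from the proof of Theorem 4.1 more concrete, the following sketch evaluates the two tuple examples and the set example on small inputs. It is an illustration under stated assumptions, not the paper's implementation: values are Python dicts (tuples, via a hypothetical hashable `Rec` class), frozensets (sets), and scalars; $\circ$ is read left to right; and $\mathrm{pairwith}_1(f, g)$ is assumed to mean $\{\langle 1 : y, 2 : g(x)\rangle \mid y \in f(x)\}$. The names `mmap`, `union`, and `pairwith1` are hypothetical.

```python
from typing import Any, Callable

Fn = Callable[[Any], Any]


class Rec(dict):
    """A complex-value tuple; hashable so that tuples can be set members."""

    def __hash__(self) -> int:
        return hash(frozenset(self.items()))


def compose(*fs: Fn) -> Fn:
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    def run(x: Any) -> Any:
        for f in fs:
            x = f(x)
        return x
    return run


def proj(a: Any) -> Fn:
    """Projection pi_a on a tuple."""
    return lambda x: x[a]


def tup(**fields: Fn) -> Fn:
    """Tuple construction <A1: f1, ..., An: fn>."""
    return lambda x: Rec({a: f(x) for a, f in fields.items()})


def mmap(f: Fn) -> Fn:
    """map(f): apply f to every member of a set."""
    return lambda s: frozenset(f(y) for y in s)


def union(f: Fn, g: Fn) -> Fn:
    """f U g: union of two sets computed from the same input."""
    return lambda x: f(x) | g(x)


def pairwith1(f: Fn, g: Fn) -> Fn:
    """Assumed reading of pairwith_1(f, g): {<1: y, 2: g(x)> | y in f(x)}."""
    return lambda x: frozenset(Rec({1: y, 2: g(x)}) for y in f(x))


# First case: copy B from the root into the tuple below edge A.
copy_into_tuple = tup(
    A=tup(B=proj("B"), C=compose(proj("A"), proj("C")), D=compose(proj("A"), proj("D"))),
    B=proj("B"),
)

# Second case: copy B into every member tuple of the set below edge A.
copy_into_set = tup(
    A=compose(
        pairwith1(proj("A"), proj("B")),
        mmap(tup(B=proj(2), C=compose(proj(1), proj("C")), D=compose(proj(1), proj("D")))),
    ),
    B=proj("B"),
)

# Copy between sets: the members of B are added to A.
copy_set_into_set = tup(A=union(proj("A"), proj("B")), B=proj("B"))

print(copy_into_tuple(Rec(A=Rec(C=1, D=2), B=9)))
# {'A': {'B': 9, 'C': 1, 'D': 2}, 'B': 9}
```

A VCP sequence $o_1, \ldots, o_n$ would then correspond to `compose(f1, ..., fn)` over such functions, mirroring the last case of the proof.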

5. DISCUSSION AND CONCLUSIONS 

In our presentation, we have implicitly assumed set-typed collections, but at least for positive monad algebra $\mathcal{M}_\cup$ and for VCP[wo selection], sets can be replaced by bags or lists, and we immediately obtain that $\mathcal{M}_\cup =$ VCP[wo selection] still holds. From a practical viewpoint, this implies that VCP can be used for visually specifying bag queries (as in SQL), but it also exposes one less obvious notion that users have to understand in order to use the VCP language successfully, namely that of collections. Under the set interpretation,
dragging a *-edge onto a *-node results in the addition of 
the members of the source sets to the destination set with du- 
plicate elimination, while for bags, duplicates are not elim- 
inated, and for lists, this operation means to append the 
source list to the destination list. 
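The three readings of dragging a *-edge onto a *-node can be sketched as follows. This is only an illustration of the paragraph above, assuming Python frozensets for sets, `collections.Counter` for bags, and lists for lists; the function names are hypothetical.

```python
from collections import Counter


def paste_set(dest: frozenset, src: frozenset) -> frozenset:
    """Set semantics: members of src are added to dest, duplicates eliminated."""
    return dest | src


def paste_bag(dest: Counter, src: Counter) -> Counter:
    """Bag semantics: multiplicities are added, no duplicate elimination."""
    return dest + src


def paste_list(dest: list, src: list) -> list:
    """List semantics: src is appended to dest."""
    return dest + src


print(paste_set(frozenset("ab"), frozenset("bc")) == frozenset("abc"))  # True
print(paste_bag(Counter("ab"), Counter("bc")) == Counter({"a": 1, "b": 2, "c": 1}))  # True
print(paste_list(["a", "b"], ["b", "c"]))  # ['a', 'b', 'b', 'c']
```

The same drag gesture thus yields one `"b"` under sets but two under bags and lists, which is exactly the notion of collection users must keep in mind.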

Regarding full monad algebra, there is unfortunately no agreed-upon notion of intersection or difference for bags and lists (several alternatives can be reasonably justified). Moreover, not all of these notions lead to the same expressive power when added to $\mathcal{M}_\cup$. For example, the arguably most natural notion of difference on bags, usually referred to as monus, is known to yield the power of arithmetic, while this is not the case for intersection or selection [13].
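The two bag operations mentioned above can be contrasted with Python's `collections.Counter`, whose `-` operator is monus (multiplicity $\max(m_1 - m_2, 0)$) and whose `&` operator takes the minimum multiplicity. The sketch does not reproduce the expressiveness result of [13]; it only shows, under a hypothetical encoding of a natural number $n$ as a bag with $n$ copies of one element, that monus behaves as truncated subtraction on multiplicities.

```python
from collections import Counter


def monus(b1: Counter, b2: Counter) -> Counter:
    """Bag difference: each multiplicity becomes max(m1 - m2, 0)."""
    return b1 - b2  # Counter subtraction drops non-positive counts


def bag_intersection(b1: Counter, b2: Counter) -> Counter:
    """Bag intersection: each multiplicity becomes min(m1, m2)."""
    return b1 & b2


def as_number(n: int) -> Counter:
    """Hypothetical encoding of a natural number n as n copies of one element."""
    return Counter({"*": n}) if n > 0 else Counter()


print(monus(Counter("aab"), Counter("ab")) == Counter("a"))  # True
print(monus(as_number(5), as_number(3)) == as_number(2))  # True: 5 - 3
print(monus(as_number(3), as_number(5)) == as_number(0))  # True: truncated at 0
print(bag_intersection(as_number(5), as_number(3)) == as_number(3))  # True: min(5, 3)
```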

Still, any of these differing "full" bag or list monad algebras can be simulated in VCP by just adding the desired operations (such as selection or difference) directly. Of course, this is a pragmatic solution, but it is no more daring than what we have done in the set case earlier in the paper:
positive monad algebra was realized using just insertion, re- 
naming, deletion, and copying, while the move to the ex- 
pressiveness of full monad algebra was made by adding a se- 
lection operation. The choice of which operations are made 
available in the language (in the case of sets, selection, inter- 
section, and difference are interchangeable and even adding 
them all does not yield greater expressiveness) should de- 
pend on which operations appear natural given the design 
choices made for the GUI of the query editor. 
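For sets, the interchangeability noted in the parenthesis above can be illustrated for intersection: it is definable from difference, and from selection over pairs followed by projection. The sketch below shows only these two directions, on Python frozensets; the function names are hypothetical.

```python
def intersect_via_difference(a: frozenset, b: frozenset) -> frozenset:
    """A & B == A - (A - B)."""
    return a - (a - b)


def intersect_via_selection(a: frozenset, b: frozenset) -> frozenset:
    """Select the pairs (x, y) of A x B with x == y, then keep x."""
    return frozenset(x for x in a for y in b if x == y)


a, b = frozenset({1, 2, 3}), frozenset({2, 3, 4})
print(intersect_via_difference(a, b) == intersect_via_selection(a, b) == a & b)  # True
```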

One future area of research will be to provide a visual 
language for defining XML queries on the basis of VCP. Ordered, unranked XML trees can be viewed as nested lists, but to get a visual language for XML with the expressive power of (full) monad algebra on lists, tuple-typed nodes are also required. These do not exist in XML, but could be simulated using the XPath position() function; that is, the children of pseudo-tuple nodes in XML would only be accessed using paths of the form "child[position() = i]", yielding the i-th attribute of the pseudo-tuple.
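The pseudo-tuple idea can be sketched with Python's `xml.etree.ElementTree`: a field is addressed only by its position among the element children, as `child::*[position() = i]` would select it. The element names `t` and `v` and the helper `tuple_field` are hypothetical; only child positions carry meaning in this encoding.

```python
import xml.etree.ElementTree as ET


def tuple_field(node: ET.Element, i: int) -> ET.Element:
    """Return the i-th element child (1-based), as child::*[position() = i] would."""
    if i < 1:
        raise ValueError("XPath positions start at 1")
    return node[i - 1]  # raises IndexError if the pseudo-tuple has fewer fields


# Hypothetical pseudo-tuple: element names carry no meaning, only positions do.
doc = ET.fromstring("<t><v>Ada</v><v>1815</v></t>")
print(tuple_field(doc, 2).text)  # 1815
```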





Figure 6: VCP$[\langle A_1 : \mathrm{id}, A_2 : \mathrm{id}, A_3 : \mathrm{id}\rangle]$.



Schema information for XML is usually provided in the 
form of Document Type Definitions or XML Schemas, which 
correspond to possibly infinite schema trees (through recursion). However, this is not a problem for VCP or for VCP extensions that correspond to XQuery queries using no transitive axes. In VCP, each query only navigates into a schema tree
up to a certain depth fixed with the query. Further work 
will be required for VCP to deal with navigation to the de- 
scendants of XML nodes. 
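Because a VCP query only navigates to a depth fixed with the query, a recursive schema only needs to be unfolded up to that depth. The sketch below illustrates this with a hypothetical DTD-like schema given as a map from element names to child element names; it is an illustration of the argument, not a procedure from the paper.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, List[str]]
Tree = Tuple[str, list]


def unfold(schema: Schema, name: str, depth: int) -> Tree:
    """Unfold a possibly recursive schema into a finite schema tree of the given depth."""
    if depth == 0:
        return (name, [])
    return (name, [unfold(schema, child, depth - 1) for child in schema.get(name, [])])


def size(t: Tree) -> int:
    """Number of nodes in an unfolded schema tree."""
    return 1 + sum(size(c) for c in t[1])


# Hypothetical recursive schema: a section contains a title and a nested section.
schema = {"section": ["title", "section"], "title": []}
print(size(unfold(schema, "section", 3)))  # 7
```

The unfolded tree stays finite for every fixed depth, which is all a query without transitive axes requires; descendant navigation would need unbounded depth.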

VCP, in an extended form, seems to be a promising candidate for a bridging formalism between (relational)
databases, file systems, and (XML or HTML) data trees. 
Such a bridge could lead to greatly enhanced usability of 
visual tools for Web site and Web service definition. This is 
one possible direction of future research. 